[cuda] [test] Add TI_DEVICE_MEMORY_GB and TI_DEVICE_MEMORY_FRACTION environment variable #769

Merged on Apr 26, 2020 (11 commits)

archibate
Collaborator

@archibate archibate requested a review from yuanming-hu April 13, 2020 15:30
@archibate
Collaborator Author

archibate commented Apr 13, 2020

There are still some crashes. Should I switch to 1 / (threads + 3)?
By the way, which test uses the most memory, and how much does it need?
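
Editor's sketch of the arithmetic discussed above: each test worker is given a share of GPU memory equal to 1 / (threads + 3), and that share is passed on through the `TI_DEVICE_MEMORY_FRACTION` variable named in this pull request's final title. The helper names `per_worker_fraction` and `worker_env`, the four-decimal formatting, and the idea of exporting the value per worker are assumptions made for illustration; the thread does not show how the merged code computes or reads the value.

```python
import os


def per_worker_fraction(threads: int, reserve: int = 3) -> float:
    """Share of GPU memory per test worker: 1 / (threads + reserve).

    `reserve` is a hypothetical parameter; the comment above only
    mentions the constant 3.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (threads + reserve)


def worker_env(threads: int, base_env=None) -> dict:
    """Copy an environment and set TI_DEVICE_MEMORY_FRACTION in the copy.

    Assumption: the variable accepts a plain decimal fraction.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TI_DEVICE_MEMORY_FRACTION"] = f"{per_worker_fraction(threads):.4f}"
    return env


if __name__ == "__main__":
    # Show how the per-worker share shrinks as the thread count grows.
    for n in (1, 2, 4, 8):
        print(n, worker_env(n, base_env={})["TI_DEVICE_MEMORY_FRACTION"])
```

With four threads this gives each worker one seventh of the device memory, so the workers together claim four sevenths and the remainder stays free as headroom.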

@archibate archibate changed the title from "[CUDA] fix out of memory when test with multi-threading on small GPU memory" to "[CUDA] fix multi-threading OOM on small GPU and add TI_DEVICE_MEMORY_GB env var" on Apr 14, 2020
@archibate
Collaborator Author

fatal: unable to access 'https://github.com/taichi-dev/taichi.git/': Could not resolve host: github.com
It seems CI failed because github.com was suffering a DDoS attack?

@archibate
Collaborator Author

Hello? Travis failed for mysterious reasons. I think we had better merge this before we run into conflicts...

@yuanming-hu
Member

I'll take care of this later today since I have too many meetings during the day. Sorry!

Member

@yuanming-hu yuanming-hu left a comment

Thanks. Please address the issues accordingly.

Review thread on cmake/TaichiCore.cmake (outdated, resolved)
Review thread on python/taichi/main.py (outdated, resolved)
@archibate
Collaborator Author

Found an unexpected error other than OOM:

tests/python/test_ad_basics.py::test_atan2_f64 Running test on arch=Arch.cuda
arch=Arch.cuda default_fp=DataType.float64
[E 04/22/20 09:55:23.427] [llvm_context.cpp:operator()@49] LLVM Fatal Error: Cannot select: intrinsic %llvm.nvvm.atomic.load.add.f64

@archibate archibate changed the title from "[CUDA] fix multi-threading OOM on small GPU and add TI_DEVICE_MEMORY_GB env var" to "[CUDA] [test] fix multi-threading OOM on small GPU and add TI_DEVICE_MEMORY_GB env var" on Apr 22, 2020
@archibate archibate changed the title from "[CUDA] [test] fix multi-threading OOM on small GPU and add TI_DEVICE_MEMORY_GB env var" to "[cuda] [test] fix multi-threading OOM on small GPU and add TI_DEVICE_MEMORY_GB env var" on Apr 22, 2020
@archibate archibate requested a review from yuanming-hu April 22, 2020 02:40
Member

@yuanming-hu yuanming-hu left a comment

(Sorry, I'm getting increasingly busy in the mornings, so please do not stay up waiting for my responses... Thanks!)

Review thread on python/taichi/main.py (outdated, resolved)
@yuanming-hu yuanming-hu changed the title from "[cuda] [test] fix multi-threading OOM on small GPU and add TI_DEVICE_MEMORY_GB env var" to "[cuda] [test] Fix multi-threading OOM on small GPU and add TI_DEVICE_MEMORY_GB env var" on Apr 23, 2020
@archibate archibate force-pushed the cuda branch 2 times, most recently from 920290b to 65d5cbe, on April 25, 2020 04:39
@archibate archibate requested a review from yuanming-hu April 25, 2020 05:11
Review thread on python/taichi/lang/__init__.py (outdated, resolved)
@archibate archibate changed the title from "[cuda] [test] Fix multi-threading OOM on small GPU and add TI_DEVICE_MEMORY_GB env var" to "[cuda] [test] Add TI_DEVICE_MEMORY_GB and TI_DEVICE_MEMORY_FRACTION environment variable" on Apr 26, 2020
@archibate
Collaborator Author

proof-read your code 3-5 times

Read-proof!

@archibate archibate requested a review from yuanming-hu April 26, 2020 04:30
Member

@yuanming-hu yuanming-hu left a comment

LGTM. Thanks.

@archibate archibate merged commit 42c19c7 into taichi-dev:master Apr 26, 2020
archibate added a commit that referenced this pull request Apr 26, 2020
…ACTION environment variable (#769)"

This reverts commit 42c19c7.
archibate added a commit to archibate/taichi that referenced this pull request Apr 26, 2020
archibate added a commit that referenced this pull request Apr 26, 2020